Abstract—Distribution matching transforms independent and
Bernoulli(1/2) distributed input bits into a sequence of output symbols
with a desired distribution. Fixed-to-fixed length, invertible,
and low-complexity encoders and decoders based on constant
composition and arithmetic coding are presented. Asymptotically
in the blocklength, the encoder achieves the maximum rate,
namely the entropy of the desired distribution. Furthermore, the
normalized divergence of the encoder output and the desired
distribution goes to zero as the blocklength grows.












I. Introduction 

A distribution matcher transforms independent
Bernoulli(1/2) distributed input bits into output symbols
with a desired distribution. We measure the distance between
the matcher output distribution and the desired distribution by
normalized informational divergence [1, p. 7]. Informational
divergence is also known as Kullback-Leibler divergence
or relative entropy [2, Sec. 2.3]. A dematcher performs the
inverse operation and recovers the input bits from the output
symbols. A distribution matcher is a building block of the
bootstrap scheme [3] that achieves the capacity of arbitrary
discrete memoryless channels [4]. Distribution matchers are
used in [5, Sec. VI] for rate adaption and in [6] to achieve
the capacity of the additive white Gaussian noise channel.

Prefix-free distribution matching was proposed in [7,
Sec. IV.A]. In [8], [9], Huffman codes are used for matching.
Optimal variable-to-fixed and fixed-to-variable length
distribution matchers are proposed in [10] and [11], respectively.
The codebooks of the matchers in [8]–[11] must be generated
offline and stored. This is infeasible for large codeword
lengths, which are necessary to achieve the maximum rate.
This problem is solved in [12], [13] by using arithmetic coding
to calculate the codebook online. The matchers proposed in
[10]–[13] are asymptotically optimal. All approaches [7]–[13]
are variable length, which can lead to varying transmission
rate, large buffer sizes, error propagation and synchronization
problems [11, Sec. I]. Fixed-to-fixed (f2f) length codes do
not have these issues. The author of [14, Sec. 4.8] suggests
concatenating short codes, and the authors of [4] employ
a forward error correction decoder to build an f2f length
matcher. The dematchers of [4] cannot always recover
the input sequence with zero error. Hence, systematic errors
are introduced that cannot be corrected by the error correction
code or by retransmission. The thesis [15] proposes an
invertible f2f length distribution matcher called adaptive arithmetic
distribution matcher (aadm). The algorithm is computationally
complex.
Fig. 1. Matching a data block $B^m = B_1 \dots B_m$ to output symbols $\tilde A^n = \tilde A_1 \dots \tilde A_n$ and reconstructing the original sequence at the dematcher. The rate is $\frac{m}{n}\left[\frac{\text{bits}}{\text{output symbol}}\right]$. The matcher can be interpreted as emulating a discrete memoryless source $P_A$.


In this work we propose practical, invertible, f2f length
distribution matchers. They are asymptotically optimal and are
based on constant composition codes indexed by arithmetic
coding. The paper is organized as follows. In Section II we
formally define distribution matching. We analyze constant
composition codes in Section III. In Section IV we show how
a constant composition distribution matcher (ccdm) and
dematcher can be implemented efficiently by arithmetic coding.


II. Problem Statement

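The entropy of a discrete random variable $A$ with alphabet
$\mathcal{A}$ and distribution $P_A$ is

$$\mathbb{H}(A) = \sum_{a\in\operatorname{supp}(P_A)} -P_A(a)\log_2 P_A(a) \tag{1}$$

where $\operatorname{supp}(P_A) \subseteq \mathcal{A}$ is the support of $P_A$. The informational
divergence of two distributions $P_{\tilde A}$ and $P_A$ on $\mathcal{A}$ is

$$\mathbb{D}(P_{\tilde A}\|P_A) = \sum_{a\in\operatorname{supp}(P_{\tilde A})} P_{\tilde A}(a)\log_2\frac{P_{\tilde A}(a)}{P_A(a)}. \tag{2}$$

The normalized informational divergence for length $n$ random
vectors $\tilde A^n = \tilde A_1 \dots \tilde A_n$ and $A^n$ is defined as

$$\frac{\mathbb{D}(P_{\tilde A^n}\|P_{A^n})}{n}. \tag{3}$$

For random vectors with independent and identically
distributed (iid) entries, we write

$$P_A^n(a^n) = \prod_{i=1}^{n} P_A(a_i). \tag{4}$$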
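As a concrete reading of definitions (1) to (4), the following minimal Python sketch computes entropy, informational divergence, and the normalized divergence of an arbitrary distribution on length $n$ strings against the iid target $P_A^n$. The function names are our own, symbols are assumed to be indices into the probability vector, and the sketch is an illustration rather than part of the paper's implementation.

```python
import math

def entropy(p):
    """H(A) in bits as in (1); the sum runs over the support of p only."""
    return -sum(pa * math.log2(pa) for pa in p if pa > 0)

def divergence(q, p):
    """D(q || p) in bits as in (2); the sum runs over supp(q)."""
    total = 0.0
    for qa, pa in zip(q, p):
        if qa > 0:
            if pa == 0:
                return math.inf  # q puts mass where p has none
            total += qa * math.log2(qa / pa)
    return total

def iid_prob(p, seq):
    """P_A^n(a^n) = prod_i P_A(a_i) as in (4); seq holds symbol indices."""
    return math.prod(p[a] for a in seq)

def normalized_divergence(q_vec, p, n):
    """(3) with an iid reference P_A^n; q_vec maps length-n tuples to probabilities."""
    total = 0.0
    for seq, q in q_vec.items():
        if q > 0:
            total += q * math.log2(q / iid_prob(p, seq))
    return total / n

# Desired distribution of Example 1.
p_example = [0.0722, 0.1654, 0.3209, 0.4415]
```

For the distribution of Example 1, `entropy(p_example)` evaluates to roughly 1.75 bits, which by (8) is the largest achievable matching rate.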

A one-to-one f2f distribution matcher is an invertible
function $f$. We denote the inverse function by $f^{-1}$. The mapping
imitates a desired distribution $P_A$ by mapping $m$ Bernoulli(1/2)
distributed bits $B^m$ to length $n$ strings $\tilde A^n = f(B^m) \in \mathcal{A}^n$.
The output distribution is $P_{\tilde A^n}$. The concept of one-to-one f2f
distribution matching is illustrated in Fig. 1.




















Definition 1. A matching rate $R = m/n$ is achievable for a
distribution $P_A$ if for any $\alpha > 0$ and sufficiently large $n$ there
is an invertible mapping $f\colon \{0,1\}^m \to \mathcal{A}^n$ for which

$$\frac{\mathbb{D}(P_{\tilde A^n}\|P_A^n)}{n} < \alpha. \tag{5}$$


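The following proposition from [16] relates the rate $R$ to condition (5).

Proposition 1 (Converse, [16, Proposition 8]). There exists a
positive-valued function $\beta$ with

$$\beta(\alpha) \xrightarrow{\alpha\to 0} 0 \tag{6}$$

such that (5) implies

$$R = \frac{m}{n} \le \frac{\mathbb{H}(A)}{\mathbb{H}(B)} + \beta(\alpha). \tag{7}$$

Proposition 1 bounds the maximum rate that can be achieved
under condition (5). Since $\mathbb{H}(B) = 1$ and $\beta(\alpha)$ vanishes as $\alpha \to 0$, we have

$$R \le \mathbb{H}(A) \tag{8}$$

for any achievable rate $R$.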

III. Constant Composition Distribution Matching 
The empirical distribution of a vector $c$ of length $n$ is
defined as

$$P_{A,c}(a) := \frac{n_a(c)}{n} \tag{9}$$

where $n_a(c) = |\{i : c_i = a\}|$ is the number of times symbol
$a$ appears in $c$. The authors of [17, Sec. 2.1] call $P_{A,c}$ the type
of $c$. An $n$-type is a type based on a length $n$ sequence. A
codebook $\mathcal{C}_\text{ccdm} \subseteq \mathcal{A}^n$ is called a constant composition code if
all codewords are of the same type, i.e., $n_a(c)$ does not depend
on the codeword $c$. We will write $n_a$ in place of $n_a(c)$ for a
constant composition code.


A. Approach 

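We use a constant composition code with $n_a/n \approx P_A(a)$. As
all $n_a$ need to be integers and add up to $n$, there are multiple
possibilities to choose the $n_a$. We use the allocation $n_a = nP_{\bar A}(a)$ that solves

$$P_{\bar A} = \operatorname*{argmin}_{P_{A'}} \mathbb{D}(P_{A'}\|P_A) \quad \text{subject to } P_{A'} \text{ is an } n\text{-type}. \tag{10}$$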
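The paper solves (10) with [18, Algorithm 2], which is not reproduced here. To make (10) concrete for small $n$ and $k$, the sketch below instead finds the minimizer by exhaustive search over all $n$-types (enumerated with stars and bars). This is a swapped-in brute-force method whose cost grows like $\binom{n+k-1}{k-1}$, so it is only an illustration; the helper names are hypothetical and ties are broken by enumeration order.

```python
import math
from itertools import combinations

def compositions(n, k):
    """Yield every k-tuple of non-negative integers summing to n (stars and bars)."""
    for bars in combinations(range(n + k - 1), k - 1):
        parts, prev = [], -1
        for b in bars:
            parts.append(b - prev - 1)   # stars between consecutive bars
            prev = b
        parts.append(n + k - 2 - prev)   # stars after the last bar
        yield tuple(parts)

def best_n_type(p, n):
    """Counts (n_a) of the n-type minimizing D(P_A' || P_A), cf. (10), by exhaustive search."""
    best, best_d = None, math.inf
    for counts in compositions(n, len(p)):
        d = 0.0
        for c, pa in zip(counts, p):
            if c == 0:
                continue                 # symbols outside supp(P_A') contribute nothing
            if pa == 0:
                d = math.inf             # P_A' must not put mass outside supp(P_A)
                break
            d += (c / n) * math.log2(c / n / pa)
        if d < best_d:
            best, best_d = counts, d
    return best, best_d
```

For example, `best_n_type([0.5, 0.5], 4)` returns the composition `(2, 2)` with divergence 0.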

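The solution of (10) can be found efficiently by [18, Algorithm 2].
Suppose the output length $n$ is fixed and that we can
choose the input length $m$. Let $\mathcal{T}^n_{P_{\bar A}}$ be the set of vectors of
type $P_{\bar A}$, i.e., we have

$$\mathcal{T}^n_{P_{\bar A}} = \left\{ v \,\middle|\, v\in\mathcal{A}^n,\ \frac{n_a(v)}{n} = P_{\bar A}(a)\ \forall a\in\mathcal{A} \right\}. \tag{11}$$

The matcher is invertible, so we need at least as many
codewords as input blocks. The input blocklength must thus not
exceed $\log_2|\mathcal{T}^n_{P_{\bar A}}|$. We set the input length to $m = \lfloor \log_2|\mathcal{T}^n_{P_{\bar A}}| \rfloor$
and we define the encoding function

$$f_\text{ccdm}\colon \{0,1\}^m \to \mathcal{T}^n_{P_{\bar A}}. \tag{12}$$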

The actual mapping $f_\text{ccdm}$ can be implemented efficiently by
arithmetic coding, as we will show in Section IV. The constant
composition codebook is now given by the image of $f_\text{ccdm}$, i.e.,

$$\mathcal{C}_\text{ccdm} = f_\text{ccdm}(\{0,1\}^m). \tag{13}$$

Since $f_\text{ccdm}$ is invertible, the codebook size is $|\mathcal{C}_\text{ccdm}| = 2^m$.
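Because $|\mathcal{T}^n_{P_{\bar A}}|$ is the multinomial coefficient $n!/\prod_{a} n_a!$, the input length $m$ and the codebook size $2^m$ can be computed exactly with integer arithmetic. The Python sketch below (helper names are ours) does this and checks the bound (14) for a few compositions; `int.bit_length` gives $\lfloor\log_2 x\rfloor$ for a positive integer $x$ without floating-point rounding.

```python
import math

def type_class_size(counts):
    """|T^n_{P_bar A}| = n! / prod_a n_a!, the number of sequences with composition counts."""
    return math.factorial(sum(counts)) // math.prod(math.factorial(c) for c in counts)

def input_length(counts):
    """m = floor(log2 |T|) as in (12), computed exactly via bit_length."""
    return type_class_size(counts).bit_length() - 1

for counts in [(2, 2), (1, 2, 3, 4), (7, 0, 5)]:
    size, m = type_class_size(counts), input_length(counts)
    assert 2 ** m <= size < 2 ** (m + 1)       # bound (14): m > log2|T| - 1
    print(counts, size, m, m / sum(counts))     # |T|, m, rate R = m / n
```

The composition (2, 2) gives $|\mathcal{T}| = 6$ and $m = 2$, which is the small setting used in Fig. 3.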


B. Analysis 

We show that $f_\text{ccdm}$ asymptotically achieves all rates
satisfying (8). We can bound $m$ by

$$m = \lfloor \log_2|\mathcal{T}^n_{P_{\bar A}}| \rfloor > \log_2|\mathcal{T}^n_{P_{\bar A}}| - 1. \tag{14}$$


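Recall that the matcher output distribution is $P_{\tilde A^n}$. Since the $m$ input bits are iid Bernoulli(1/2) and $f_\text{ccdm}$ is invertible, $P_{\tilde A^n}$ is uniform on $\mathcal{C}_\text{ccdm}$, i.e., $P_{\tilde A^n}(a^n) = 2^{-m}$ for all $a^n \in \mathcal{C}_\text{ccdm}$. We have

$$
\begin{aligned}
\mathbb{D}(P_{\tilde A^n}\|P_A^n) &= \sum_{a^n\in\mathcal{C}_\text{ccdm}} 2^{-m}\log_2\frac{2^{-m}}{P_A^n(a^n)}\\
&= \mathbb{D}(P_{\tilde A^n}\|P_{\bar A}^n) + \sum_{a^n\in\mathcal{C}_\text{ccdm}} 2^{-m}\log_2\frac{P_{\bar A}^n(a^n)}{P_A^n(a^n)}\\
&= \mathbb{D}(P_{\tilde A^n}\|P_{\bar A}^n) + |\mathcal{C}_\text{ccdm}|\,2^{-m}\sum_{a\in\operatorname{supp}(P_{\bar A})} n_a\log_2\frac{P_{\bar A}(a)}{P_A(a)}\\
&= \underbrace{\mathbb{D}(P_{\tilde A^n}\|P_{\bar A}^n)}_{\text{Term 1}} + \underbrace{n\,\mathbb{D}(P_{\bar A}\|P_A)}_{\text{Term 2}}.
\end{aligned}
\tag{15}
$$

For Term 1 we obtain

$$
\begin{aligned}
\mathbb{D}(P_{\tilde A^n}\|P_{\bar A}^n) &= \sum_{a^n\in\mathcal{C}_\text{ccdm}} 2^{-m}\log_2\frac{2^{-m}}{P_{\bar A}^n(a^n)}\\
&= \sum_{a^n\in\mathcal{C}_\text{ccdm}} 2^{-m}\log_2\frac{2^{-m}}{2^{-n\mathbb{H}(\bar A)}}\\
&= n\mathbb{H}(\bar A) - m
\end{aligned}
\tag{16}
$$

where we used that every $a^n \in \mathcal{C}_\text{ccdm} \subseteq \mathcal{T}^n_{P_{\bar A}}$ has probability $P_{\bar A}^n(a^n) = \prod_{a} P_{\bar A}(a)^{n_a} = 2^{-n\mathbb{H}(\bar A)}$. Using (16) in (15) and dividing by $n$, we have

$$\frac{\mathbb{D}(P_{\tilde A^n}\|P_A^n)}{n} = \mathbb{H}(\bar A) - \frac{m}{n} + \mathbb{D}(P_{\bar A}\|P_A). \tag{17}$$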
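The decomposition (15) to (17) holds for any codebook of $2^m$ distinct sequences of a single type. The sketch below checks it numerically on small examples: it computes the left-hand side of (17) directly from the uniform output distribution and compares it with $\mathbb{H}(\bar A) - m/n + \mathbb{D}(P_{\bar A}\|P_A)$. The helper name is hypothetical, and taking the first $2^m$ sequences in lexicographic order as the codebook is an arbitrary assumption made only for this check.

```python
import math
from itertools import permutations

def check_decomposition(p, counts):
    """Return both sides of (17) for a codebook of 2^m sequences with composition counts."""
    n = sum(counts)
    base = [a for a, c in enumerate(counts) for _ in range(c)]
    type_class = sorted(set(permutations(base)))   # T^n_{P_bar A} in lexicographic order
    m = len(type_class).bit_length() - 1           # m = floor(log2 |T|)
    codebook = type_class[: 2 ** m]                 # assumption: any 2^m sequences of the type
    q = 2.0 ** -m                                   # uniform output distribution on the codebook
    lhs = sum(q * math.log2(q / math.prod(p[a] for a in seq)) for seq in codebook) / n
    p_bar = [c / n for c in counts]
    h_bar = -sum(x * math.log2(x) for x in p_bar if x > 0)
    d_bar = sum(x * math.log2(x / pa) for x, pa in zip(p_bar, p) if x > 0)
    return lhs, h_bar - m / n + d_bar
```

Brute-force enumeration of the type class limits this check to blocklengths of a few symbols.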


The choice of $P_{\bar A}$ guarantees (see [18, Proposition 4])
that for the third term in (17) we have

$$\mathbb{D}(P_{\bar A}\|P_A) < \frac{k^2}{\min_{a\in\operatorname{supp}(P_A)} P_A(a)\, n^2} \tag{18}$$

where $k = |\mathcal{A}|$ is the alphabet size. Consequently, this
term vanishes as the blocklength approaches infinity,
i.e., we have

$$\lim_{n\to\infty} \mathbb{D}(P_{\bar A}\|P_A) = 0. \tag{19}$$

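We now relate the input and output lengths to understand the
asymptotic behavior of the rate. By [17, Lemma 2.2], we have

$$|\mathcal{T}^n_{P_{\bar A}}| \ge \binom{n+k-1}{k-1}^{-1} 2^{n\mathbb{H}(\bar A)} \ge (n+k)^{-k}\, 2^{n\mathbb{H}(\bar A)}. \tag{20}$$

Taking the logarithm to the base 2 and dividing by $n$, we have

$$\frac{\log_2|\mathcal{T}^n_{P_{\bar A}}|}{n} \ge -\frac{k\log_2(n+k)}{n} + \mathbb{H}(\bar A). \tag{21}$$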


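For the rate, we obtain

$$R = \frac{m}{n} \overset{(14)}{>} \frac{\log_2|\mathcal{T}^n_{P_{\bar A}}| - 1}{n} \overset{(21)}{\ge} -\frac{k\log_2(n+k) + 1}{n} + \mathbb{H}(\bar A). \tag{22}$$

Moreover, every sequence in $\mathcal{T}^n_{P_{\bar A}}$ has probability $2^{-n\mathbb{H}(\bar A)}$ under $P_{\bar A}^n$, so $|\mathcal{T}^n_{P_{\bar A}}| \le 2^{n\mathbb{H}(\bar A)}$ and hence $R \le \mathbb{H}(\bar A)$. In the asymptotic case we therefore have

$$\lim_{n\to\infty}\bigl(\mathbb{H}(\bar A) - R\bigr) = 0. \tag{23}$$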
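The bounds (20) to (22) do not depend on how the $n$-type is selected, so the sketch below uses a simple largest-remainder rounding of $nP_A$ (a hypothetical stand-in, not the solution of (10)) to tabulate the rate $R = m/n$ against the lower bound (22) and the upper bound $\mathbb{H}(\bar A)$ for the distribution of Example 1. The function names are ours.

```python
import math

def largest_remainder(p, n):
    """Hypothetical helper: integer counts summing to n that round n * p (not the solution of (10))."""
    counts = [math.floor(n * pa) for pa in p]
    order = sorted(range(len(p)), key=lambda a: n * p[a] - counts[a], reverse=True)
    for a in order[: n - sum(counts)]:
        counts[a] += 1                              # hand out the missing units
    return counts

def rate_and_bounds(p, n):
    """Return (R, right-hand side of (22), H(bar A)) for the rounded n-type."""
    k = len(p)
    counts = largest_remainder(p, n)
    size = math.factorial(n) // math.prod(math.factorial(c) for c in counts)
    m = size.bit_length() - 1                       # m = floor(log2 |T|)
    h_bar = -sum(c / n * math.log2(c / n) for c in counts if c > 0)
    lower = h_bar - (k * math.log2(n + k) + 1) / n
    return m / n, lower, h_bar

p_example = [0.0722, 0.1654, 0.3209, 0.4415]
for n in (10, 100, 1000):
    r, lower, h_bar = rate_and_bounds(p_example, n)
    print(f"n={n:5d}  lower={lower:.3f}  R={r:.3f}  H(bar A)={h_bar:.3f}")
```

The gap between $R$ and $\mathbb{H}(\bar A)$ shrinks like $\log_2(n)/n$, in line with (22) and (23).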
Fig. 2. Normalized divergence and rate of ccdm over output blocklength for
$P_A = (0.0722, 0.1654, 0.3209, 0.4415)$. For comparison, the performance
of optimal f2f [14, Sec. 4.4] and aadm [15] is displayed. Because of limited
computational resources, we could calculate the performance of optimal f2f
only up to a blocklength of $n = 90$.


From (19) and [16, Proposition 6] we know that $\mathbb{H}(\bar A) \to \mathbb{H}(A)$;
hence $R \to \mathbb{H}(A)$, the maximum rate allowed by (8). By using (19) and (23) in (17),
the normalized divergence approaches zero for $n \to \infty$.

Example 1. The desired distribution is

$$P_A = (0.0722, 0.1654, 0.3209, 0.4415).$$

Fig. 2 shows the normalized divergences and rates of ccdm and
the optimal f2f length matcher [14, Sec. 4.4]. The empirical
performance of aadm is also displayed. For optimal f2f
and aadm, the rate is fixed to $\mathbb{H}(A)$ bits per symbol. Observe
that ccdm needs about 4 times the blocklength of the
optimal scheme to reach an informational divergence of 0.06
bits per symbol. However, the memory for storing the optimal
codebook grows exponentially in $m$. For $n = 10$, we already
need about 10240 bits = 1.25 kB; for $n = 100$ we would need
$1.441 \times 10^{\dots}$ TB of memory. In this example, ccdm performs
better than aadm for short blocklengths of up to 100 symbols.
Fig. 2 also shows the lower and upper bounds (??) and (22),
respectively.


IV. Arithmetic Coding 

We use arithmetic coding to index sequences efficiently.
Our arithmetic encoder associates an interval with each input
sequence in $\{0,1\}^m$ and an interval with each output
sequence in $\mathcal{T}^n_{P_{\bar A}}$; see Fig. 3 for an example. The size of an
interval is equal to the probability of the corresponding
sequence according to the input and output model, respectively.
For the input model we choose an iid Bernoulli(1/2) process.
We describe the output model by a random vector

$$\tilde A^n = \tilde A_1 \tilde A_2 \cdots \tilde A_n \tag{24}$$

with marginals $P_{\tilde A_i} = P_{\bar A}$ and the uniform distribution

$$P_{\tilde A^n}(a^n) = \frac{1}{|\mathcal{T}^n_{P_{\bar A}}|} \quad \forall a^n \in \mathcal{T}^n_{P_{\bar A}}.$$

Fig. 3. Diagram of a constant composition arithmetic encoder with $P_{\bar A}(0) = P_{\bar A}(1) = 0.5$, $m = 2$ and $n = 4$. The output intervals are labeled 0011, 0101, 0110, 1001, 1010 and 1100.



The intervals are ordered lexicographically. All input and 
output intervals range from 0 to 1 because all probabilities 
add up to 1. 

Example 2. Fig. 3 shows input and output intervals with
output length $n = 4$ and $P_{\bar A}(0) = P_{\bar A}(1) = 1/2$.¹ There are
4 equally probable input sequences and 6 equally probable
output sequences. The intervals on the input side are [0, 0.25),
[0.25, 0.5), [0.5, 0.75) and [0.75, 1). The intervals on the
output side are [0, 1/6), [1/6, 2/6), [2/6, 3/6), [3/6, 4/6), [4/6, 5/6) and [5/6, 1).

The arithmetic encoder can link an output sequence to an
input sequence if the lower border of the output interval lies
inside the input interval. In the example (Fig. 3), '00' may link
to both '0011' and '0101', while for '01' only a link to '0110'
is possible. There are at most two possible choices because,
by (14), the input interval size $2^{-m}$ is less than twice the output
interval size $1/|\mathcal{T}^n_{P_{\bar A}}|$; since $2^{-m} \ge 1/|\mathcal{T}^n_{P_{\bar A}}|$, there is also always at least one. Both choices are valid and allow an
inverse operation. In our implementation, the encoder chooses
the output sequence with the lowest interval border. As
a result, the codebook $\mathcal{C}_\text{ccdm}$ of Example 2 is {'0011', '0110',
'1001', '1100'}. In general, $\mathcal{C}_\text{ccdm}$ has cardinality $2^m$ with
$2^m \le |\mathcal{T}^n_{P_{\bar A}}| < 2^{m+1}$ according to (14). It is not possible to index the
whole set $\mathcal{T}^n_{P_{\bar A}}$ unless $2^m = |\mathcal{T}^n_{P_{\bar A}}|$. The analysis of the code
(Section III-B) is valid for all codebooks $\mathcal{C}_\text{ccdm} \subseteq \mathcal{T}^n_{P_{\bar A}}$. The
actual subset is implicitly defined by the arithmetic encoder.
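The linking rule can be checked by brute force for small $n$. The sketch below enumerates $\mathcal{T}^n_{P_{\bar A}}$ in lexicographic order, so that the $j$-th sequence owns the output interval $[j/|\mathcal{T}|, (j+1)/|\mathcal{T}|)$, and maps every input block to the sequence with the lowest border inside its input interval; the inverse returns the input interval that contains the border. The function names are hypothetical, symbols are assumed to be the digits 0 to 9, and the explicit enumeration is exponential in $n$, so this only illustrates the rule and is not the online algorithm described next. Exact rationals from `fractions.Fraction` avoid rounding issues.

```python
import math
from fractions import Fraction
from itertools import permutations

def type_class(counts):
    """All sequences in which symbol a occurs counts[a] times, in lexicographic order."""
    base = [a for a, c in enumerate(counts) for _ in range(c)]
    return sorted(set(permutations(base)))

def link(counts):
    """Map every m-bit input block to the output with the lowest border in its input interval."""
    seqs = type_class(counts)
    size = len(seqs)
    m = size.bit_length() - 1                    # m = floor(log2 |T|)
    table = {}
    for idx in range(2 ** m):
        lo = Fraction(idx, 2 ** m)               # input interval [lo, lo + 2^-m)
        j = math.ceil(lo * size)                 # first output border j/|T| >= lo
        assert Fraction(j, size) < lo + Fraction(1, 2 ** m)  # border lies inside the input interval
        bits = format(idx, f"0{m}b") if m > 0 else ""
        table[bits] = "".join(map(str, seqs[j]))
    return table

def unlink(counts, word):
    """Inverse of link: the input block whose interval contains the border of word."""
    seqs = type_class(counts)
    size = len(seqs)
    m = size.bit_length() - 1
    j = seqs.index(tuple(int(ch) for ch in word))
    idx = math.floor(Fraction(j, size) * 2 ** m)
    return format(idx, f"0{m}b") if m > 0 else ""
```

For the setting of Example 2, `link((2, 2))` reproduces the codebook {'0011', '0110', '1001', '1100'}.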

We now discuss the online algorithm that processes the input
sequentially. Initially, the input interval spans from 0 to 1. As
the input model is Bernoulli(1/2), we split the interval into two
equally sized intervals and continue with the upper interval in
case the first input bit is '1'; otherwise we continue with the
lower interval. After the next input bit arrives, we repeat the last
step. After $m$ input bits we reach an interval of size $2^{-m}$. After
every refinement of the input interval, the algorithm checks
for a sure prefix of the output sequence; e.g., in Fig. 3 we
see that if the input starts with 1, the output must start with
1. Every time we extend the sure prefix by a new symbol,
we must calculate the probability of the next symbol given
the sure prefix. That means we determine the output intervals
within the sure interval of the prefix. The model for calculating
the conditional probabilities is based on drawing without
replacement. There is a bag with $n$ symbols of $k$ distinguishable
kinds. $n_a$ denotes how many symbols of kind $a$ are initially
in the bag, and $n'_a$ is the current number. The probability to
draw a symbol of kind $a$ is $n'_a/n$, where $n$ here denotes the current
total number of symbols in the bag. If we pick a symbol $a$, both
$n$ and $n'_a$ decrement by 1.

¹Please note that in this case no distribution matcher is needed. However,
the invertible mapping is of interest in its own right.

Fig. 4. Refinement of the output intervals. Round brackets indicate symbols
that must follow with probability one.
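The bag model translates directly into code. The sketch below returns the conditional distribution of the next output symbol given an already drawn prefix, and splits the current output interval accordingly. Both function names are hypothetical and the sketch uses exact fractions; it illustrates the model only.

```python
from fractions import Fraction

def next_symbol_probs(counts, prefix):
    """Distribution of the next symbol after prefix has been drawn (without replacement)
    from a bag that initially holds counts[a] symbols of kind a."""
    remaining = list(counts)                 # n'_a: symbols of kind a still in the bag
    for a in prefix:
        if remaining[a] == 0:
            raise ValueError("prefix does not fit the composition")
        remaining[a] -= 1
    total = sum(remaining)                   # current number of symbols in the bag
    if total == 0:
        raise ValueError("the bag is empty")
    return [Fraction(r, total) for r in remaining]

def refine(low, width, probs):
    """Split the output interval [low, low + width) into one sub-interval per symbol."""
    bounds, start = [], low
    for q in probs:
        bounds.append((start, start + width * q))
        start += width * q
    return bounds
```

With two '0's and two '1's in the bag, `next_symbol_probs((2, 2), [0])` gives [1/3, 2/3], and refining the interval [0, 1/2) of the prefix '0' yields [0, 1/6) and [1/6, 1/2), matching the output intervals of Fig. 3.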

Example 3. Fig. 4 shows a refinement of the output intervals.
Initially there are two '0's and two '1's in the bag. The distribution
of the first drawn symbol is $P_{\tilde A_1}(0) = P_{\tilde A_1}(1) = 1/2$. When
drawing a '0', there are 3 symbols remaining: one '0' and
two '1's. Thus, the probability for a '0' reduces to 1/3 while
the probability of '1' is 2/3. If two '0's were picked, two '1's
must follow. This way we ensure that the encoder output is
of the desired type. Observe that the probabilities of the next
symbol conditioned on the previous symbols are unequal in
general; here, for instance, we have

$$P_{\tilde A_2|\tilde A_1}(0|0) = \frac{1}{3} \ne \frac{2}{3} = P_{\tilde A_2|\tilde A_1}(0|1). \tag{25}$$

However, $P_{\tilde A^n}(a^n) = \prod_{i=1}^{n} P_{\tilde A_i|\tilde A^{i-1}}(a_i|a^{i-1})$ is constant on $\mathcal{T}^n_{P_{\bar A}}$,
as we show in the following proposition.

Proposition 2. After $n$ refinements of the output interval, the
model used for the refinement step stated above creates equally
spaced (equally probable) intervals that are labeled with all
sequences in $\mathcal{T}^n_{P_{\bar A}}$.

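Proof. All symbols in the bag are chosen at some point.
Consequently, only sequences in $\mathcal{T}^n_{P_{\bar A}}$ may appear. The probability
of each such string is a product of fractions $n'_a/n$, where the current
total $n$ takes on all values from its initial value down to
1 because every symbol is drawn at some point, and, for each kind $a$,
the numerator $n'_a$ takes on all values from $n_a$ down to 1. Thus for
each string we obtain for its probability an expression that is
independent of the realization itself:

$$P_{\tilde A^n}(a^n) = \frac{\prod_{a\in\mathcal{A}} n_a!}{n!} = \frac{1}{|\mathcal{T}^n_{P_{\bar A}}|} \quad \forall a^n \in \mathcal{T}^n_{P_{\bar A}}. \tag{26}$$

□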
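Proposition 2 can also be checked exhaustively for small compositions. The sketch below computes the probability of every length-$n$ string under the bag model and asserts that the strings with nonzero probability are exactly those of the prescribed type, each with probability $\prod_a n_a!/n!$ as in (26), so that their intervals tile $[0, 1)$. The helper names are hypothetical, and the enumeration over all $k^n$ strings restricts it to toy sizes.

```python
import math
from fractions import Fraction
from itertools import product

def sequence_prob(counts, seq):
    """Probability of drawing seq from the bag: a product of n'_a / n factors."""
    remaining = list(counts)
    prob = Fraction(1)
    for a in seq:
        total = sum(remaining)
        if total == 0 or remaining[a] == 0:
            return Fraction(0)
        prob *= Fraction(remaining[a], total)
        remaining[a] -= 1
    return prob

def check_prop2(counts):
    """Verify Proposition 2 for one composition; return (number of strings, their probability)."""
    n, k = sum(counts), len(counts)
    target = Fraction(math.prod(math.factorial(c) for c in counts), math.factorial(n))  # (26)
    support = [s for s in product(range(k), repeat=n) if sequence_prob(counts, s) > 0]
    assert all(tuple(s.count(a) for a in range(k)) == tuple(counts) for s in support)
    assert all(sequence_prob(counts, s) == target for s in support)
    assert len(support) * target == 1        # equal-width intervals that tile [0, 1)
    return len(support), target
```

For the composition of Example 3, `check_prop2((2, 2))` returns six strings of probability 1/6 each.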


Numerical problems for representing the input interval and the 
output interval occur after a certain number of input bits. For 
this reason we introduce a rescaling each time a new output 
symbol is known. We explain this next. 



Fig. 5. Scaling of input and output intervals in case the input interval is 
a subset of an output interval. The latter interval corresponds to [0,1) after 
scaling. A star indicates that this is just a prefix of the complete word. Round 
brackets indicate symbols that must follow with probability one. 


A. Scaling input and output intervals 

After we identify a prefix, we are no longer interested in
code sequences that do not have that prefix. We scale the input
and output intervals such that the output interval is [0, 1). Fig. 5
illustrates the mapping of intervals $(\text{in}_1, \text{out}_1)$ to $(\text{in}_2, \text{out}_2)$.
The refinement for the second symbol works as described
in Example 3. If the second input bit is 0, we know that
10 must be a prefix of the output. The resulting scaling is
shown in Fig. 5 as $(\text{in}_2, \text{out}_2)$ to $(\text{in}_3, \text{out}_3)$. A more detailed
explanation of scaling for arithmetic coding can be found, for
instance, in [19, Chap. 4]. We provide an implementation of
ccdm online [20].
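The sure-prefix rule can be made concrete with exact rationals. In the sketch below, `sequence_at` locates the output sequence whose interval contains a given point by refining intervals with the drawing-without-replacement model, and `sure_prefixes` reports, after each input bit, the longest prefix shared by all output sequences whose lower border lies in the current input interval; the first and last such sequences suffice because the intervals are ordered lexicographically. All names are hypothetical. Python's `Fraction` gives exact arithmetic, so this sketch sidesteps the numerical problems that motivate rescaling instead of implementing the scaling of Fig. 5, and it recomputes the candidates at every step rather than updating them incrementally. It is not the C/MATLAB implementation of [20].

```python
import math
from fractions import Fraction

def sequence_at(counts, t):
    """Output sequence whose interval contains t in [0, 1), found by refining the
    interval symbol by symbol with the drawing-without-replacement model."""
    remaining = list(counts)
    base, width = Fraction(0), Fraction(1)
    seq = []
    for _ in range(sum(counts)):
        total = sum(remaining)
        for a, r in enumerate(remaining):
            sub = width * Fraction(r, total)     # sub-interval of symbol a (empty if r == 0)
            if r > 0 and t < base + sub:
                seq.append(a)
                remaining[a] -= 1
                width = sub
                break
            base += sub
    return seq

def common_prefix(x, y):
    out = []
    for a, b in zip(x, y):
        if a != b:
            break
        out.append(a)
    return out

def sure_prefixes(counts, bits):
    """Sure output prefix after each of the m input bits, and the final output word."""
    size = math.factorial(sum(counts)) // math.prod(math.factorial(c) for c in counts)
    if len(bits) != size.bit_length() - 1:
        raise ValueError("expected exactly m = floor(log2 |T|) input bits")
    lo, width = Fraction(0), Fraction(1)
    prefixes = []
    for b in bits:
        width /= 2                               # Bernoulli(1/2) input model: halve the interval
        if b == "1":
            lo += width                          # continue with the upper half
        # Candidates: outputs whose lower border j/|T| lies in [lo, lo + width).
        first = sequence_at(counts, Fraction(math.ceil(lo * size), size))
        last = sequence_at(counts, Fraction(math.ceil((lo + width) * size) - 1, size))
        prefixes.append("".join(map(str, common_prefix(first, last))))
    # After all m bits, the encoder picks the output with the lowest border >= lo.
    final = sequence_at(counts, Fraction(math.ceil(lo * size), size))
    return prefixes, "".join(map(str, final))
```

For the setting of Fig. 3, `sure_prefixes((2, 2), "10")` returns `(["1", "10"], "1001")`: after the first bit the output must start with 1, and after the second bit with 10, as described above.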


V. Conclusion 

We presented a practical and invertible f2f length
distribution matcher that achieves the maximum rate asymptotically
in the blocklength. In contrast to matchers proposed in the
literature [7]–[13], the f2f matcher is robust to synchronization
and variable rate problems. Error propagation is limited by the
blocklength. In future work we plan to investigate f2f length
codes that perform well in the finite blocklength regime.

VI. Acknowledgment 

We wish to thank Irina Bocharova and Boris Kudryashov 
for encouraging us to work on the presented approach. 

References 

[1] I. Csiszár and J. Körner, Information Theory: Coding Theorems for
Discrete Memoryless Systems. Cambridge University Press, 2011.

[2] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. 
John Wiley & Sons, Inc., 2006. 

[3] G. Böcherer and R. Mathar, “Operating LDPC codes with zero shaping
gap,” in Proc. IEEE Inf. Theory Workshop (ITW), 2011.

[4] M. Mondelli, S. H. Hassani, and R. Urbanke, “How to achieve the
capacity of asymmetric channels,” Proc. Allerton Conf. Commun., Contr.,
Comput., pp. 789-796, Sep. 2014.

[5] D. MacKay, “Good error-correcting codes based on very sparse
matrices,” IEEE Trans. Inf. Theory, vol. 45, no. 2, pp. 399-431, 1999.

[6] G. Böcherer, P. Schulte, and F. Steiner, “Bandwidth efficient and
rate-compatible low-density parity-check coded modulation,” arXiv
preprint, 2015. [Online]. Available: http://arxiv.org/abs/1502.02733

[7] G. Forney, Jr., R. Gallager, G. Lang, F. Longstaff, and S. Qureshi,
“Efficient modulation for band-limited channels,” IEEE J. Sel. Areas
Commun., vol. 2, no. 5, pp. 632-647, 1984.

[8] F. R. Kschischang and S. Pasupathy, “Optimal nonuniform signaling for 
Gaussian channels,” IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 913- 
929, 1993. 

















[9] G. Ungerböck, “Huffman shaping,” in Codes, Graphs, and Systems,
R. Blahut and R. Koetter, Eds. Springer, 2002, ch. 17, pp. 299-313.

[10] G. Böcherer and R. Mathar, “Matching dyadic distributions to channels,”
in Proc. Data Compression Conf., 2011, pp. 23-32.

[11] R. A. Amjad and G. Böcherer, “Fixed-to-variable length distribution
matching,” in Proc. IEEE Int. Symp. Inf. Theory (ISIT), 2013, pp. 1511-
1515.

[12] N. Cai, S.-W. Ho, and R. Yeung, “Probabilistic capacity and optimal 
coding for asynchronous channel,” in Proc. IEEE Inf. Theory Workshop 
(ITW), 2007, pp. 5^59. 

[13] S. Baur and G. Böcherer, “Arithmetic distribution matching,” in Proc.
Int. ITG Conf. Syst. Commun. Coding, Feb. 2015.

[14] R. A. Amjad, “Algorithms for simulation of discrete memoryless
sources,” Master’s thesis, Technische Universität München, 2013.

[15] P. Schulte, “Zero error fixed length distribution matching,” Master’s
thesis, Technische Universität München, 2014.

[16] G. Böcherer and R. A. Amjad, “Informational divergence and entropy
rate on rooted trees with probabilities,” in Proc. IEEE Int. Symp. Inf.
Theory (ISIT), Sep. 2014, pp. 176-180.

[17] I. Csiszár and P. C. Shields, “Information theory and statistics: A
tutorial,” Foundations and Trends in Commun. Inf. Theory, vol. 1, no. 4,
pp. 417-528, 2004.

[18] G. Böcherer and B. C. Geiger, “Optimal quantization for distribution
synthesis,” arXiv preprint, 2014. [Online]. Available:
http://arxiv.org/abs/1307.6843

[19] K. Sayood, Introduction to data compression. Elsevier, 2006. 

[20] “A fixed-to-fixed length distribution matcher in C/MATLAB.” [Online].
Available: http://beam.to/ccdm


